
26 research outputs found

Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning

Author: Bonadiman Daniele
Kumar Anjishnu
Mittal Arpit
Publication date: 01/01/2019

The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve equivalent questions that result in the same answer as the original question. Such a system can be used to understand and answer rare and noisy reformulations of common questions by mapping them to a set of canonical forms. This has large-scale applications for community Question Answering (cQA) and open-domain spoken language question answering systems. In this paper, we describe a new QPR system implemented as a Neural Information Retrieval (NIR) system consisting of a neural network sentence encoder and an approximate k-Nearest Neighbour index for efficient vector retrieval. We also describe our mechanism for automatically generating an annotated dataset for question paraphrase retrieval experiments from question-answer logs via distant supervision. We show that the standard loss function in NIR, triplet loss, does not perform well with noisy labels. We propose smoothed deep metric loss (SDML) and show, through experiments on two QPR datasets, that it significantly outperforms triplet loss in the noisy label setting.

arXiv.org e-Print Archive

Crossref
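
Illustrative note on the first result above: its abstract contrasts triplet loss with a smoothed metric loss but gives no formula for SDML. The sketch below shows the standard triplet loss next to a label-smoothed cross-entropy over in-batch candidates. The second function is an assumption made for illustration, not the paper's definition of SDML, and all function names are hypothetical.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)

def smoothed_batch_loss(anchors, candidates, smoothing=0.1):
    """Label-smoothed cross-entropy over in-batch candidates (assumed form).

    candidates[i] is the labelled paraphrase of anchors[i]; every other
    candidate in the batch acts as a negative. Requires at least two pairs.
    """
    n = len(anchors)
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Negative distances act as logits: closer candidates score higher.
        logits = [-squared_distance(anchor, c) for c in candidates]
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        log_probs = [x - log_z for x in logits]
        # Smoothed target: most mass on the labelled pair, the rest spread evenly.
        off_target = smoothing / (n - 1)
        target = [1.0 - smoothing if j == i else off_target for j in range(n)]
        total -= sum(t * lp for t, lp in zip(target, log_probs))
    return total / n
```

Because the smoothed target never puts all its mass on a single label, one mislabelled pair contributes a bounded share of the training signal, which is one plausible reading of why smoothing helps with noisy labels.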

Effective shared representations with Multitask Learning for Community Question Answering.

Author: Alessandro Moschitti
Antonio Uva
Daniele Bonadiman
Publication date: 01/01/2017

Crossref

Open Access Repository

Ranking Kernels for Structures and Embeddings: A Hybrid Preference and Classification Model

Author: Alessandro Moschitti
Daniele Bonadiman
Kateryna Tymoshenko
Publication date: 01/01/2017

Crossref

Open Access Repository

Knowledge-driven slot constraints for goal-oriented dialogue systems

Author: Daniele Bonadiman
Piyawat Lertvittayakumjorn
Saab Mansour
Publication date: 01/01/2021

In goal-oriented dialogue systems, users provide information through slot values to achieve specific goals. In practice, some combinations of slot values can be invalid according to external knowledge. For example, the combination of "cheese pizza" (a menu item) and "oreo cookies" (a topping) in the input utterance "Can I order a cheese pizza with oreo cookies on top?" is invalid according to the menu of a restaurant business. Traditional dialogue systems execute validation rules as a post-processing step after slots have been filled, which can lead to error accumulation. In this paper, we formalize knowledge-driven slot constraints and present a new task of constraint violation detection accompanied by benchmarking data. We then propose methods to integrate the external knowledge into the system, model constraint violation detection as an end-to-end classification task, and compare it to the traditional rule-based pipeline approach. Experiments on two domains of the MultiDoGO dataset reveal the challenges of constraint violation detection and set the stage for future work and improvements.

Open Access Repository
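
Illustrative note on the slot-constraint abstract above: the traditional rule-based pipeline it describes (validation after slot filling) can be sketched as a lookup against external knowledge. The menu contents, slot names, and function name below are hypothetical and are not taken from the paper or from the MultiDoGO dataset.

```python
# Hypothetical external knowledge: the toppings each menu item allows.
ALLOWED_TOPPINGS = {
    "cheese pizza": {"mushrooms", "olives", "extra cheese"},
    "ice cream": {"oreo cookies", "sprinkles"},
}

def find_violations(slots):
    """Return (menu_item, topping) pairs that the menu does not allow.

    `slots` is the output of an upstream slot filler, e.g.
    {"menu_item": "cheese pizza", "toppings": ["oreo cookies"]}.
    Items missing from the menu allow no toppings at all.
    """
    item = slots.get("menu_item")
    allowed = ALLOWED_TOPPINGS.get(item, set())
    return [(item, topping)
            for topping in slots.get("toppings", [])
            if topping not in allowed]

# The utterance from the abstract, after slot filling:
order = {"menu_item": "cheese pizza", "toppings": ["oreo cookies"]}
violations = find_violations(order)
```

A check like this only sees the slots it is handed, so any slot-filling mistake flows straight into the validation result; this is the error accumulation the abstract attributes to the pipeline approach.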

Multitask Learning with Deep Neural Networks for Community Question Answering

Author: Bonadiman Daniele
Moschitti Alessandro
Uva Antonio
Publication venue: OpenEdition
Publication date: 15/12/2020

In this paper, we developed a deep neural network (DNN) that learns to simultaneously solve the three tasks of the cQA challenge proposed by SemEval-2016 Task 3, i.e., question-comment similarity, question-question similarity and new question-comment similarity. The latter is the main task, which can exploit the previous two to achieve better results. Our DNN is trained jointly on all three cQA tasks and learns to encode questions and comments into a single vector representation shared across the multiple tasks. The results on the official challenge test set show that our approach produces higher accuracy and faster convergence rates than the individual neural networks. Additionally, our method, which does not use any manual feature engineering, approaches the state of the art established with methods that make heavy use of it.

OpenEdition

Recurrent Context Window Networks for Italian Named Entity Recognizer

Author: Bonadiman Daniele
Moschitti Alessandro
Severyn Aliaksei
Publication venue: OpenEdition
Publication date: 15/12/2020

In this paper, we introduce a Deep Neural Network (DNN) for engineering Named Entity Recognizers (NERs) in Italian. Our network uses a sliding window of word contexts to predict tags. It relies on a simple word-level log-likelihood as a cost function and uses a new recurrent feedback mechanism to ensure that the dependencies between the output tags are properly modeled. These choices make our network simple and computationally efficient. Unlike the previous best NERs for Italian, our model does not require manually designed features, external parsers or additional resources. The evaluation on the Evalita 2009 benchmark shows that our DNN performs on par with the best NERs, outperforming the state of the art when gazetteer features are used.

OpenEdition
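
Illustrative note on the NER abstract above: the "sliding window of word contexts" can be sketched as below. The window size, padding token, and function name are assumptions; the abstract does not state them, and the recurrent feedback mechanism is not modelled here.

```python
def context_windows(tokens, size=5, pad="<PAD>"):
    """Build one fixed-size context window per token, centred on that token.

    `size` must be odd so that each token sits exactly in the middle;
    sentence boundaries are filled with the `pad` token.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    padded = [pad] * half + list(tokens) + [pad] * half
    return [padded[i:i + size] for i in range(len(tokens))]

# Each window would be the classifier input for the tag of its centre word.
windows = context_windows(["Mario", "vive", "a", "Roma"], size=3)
```

Each window depends only on a fixed number of neighbouring words, so windows can be built and scored independently of sentence length, which is consistent with the efficiency claim in the abstract.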

Neural sentiment analysis for a real-world application

Author: Bonadiman Daniele
Castellucci Giuseppe
Favalli Andrea
Moschitti Alessandro
Romagnoli Raniero
Publication date: 01/01/2017

In this paper, we describe our neural network models for a commercial sentiment analysis application. Unlike academic work, which is oriented towards complex networks that achieve marginal improvements, real scenarios require flexible and efficient neural models. The ability to use the same models across different domains and languages plays an important role in selecting the most appropriate architecture. We found that a small modification of the network that is state of the art on academic benchmarks led to a flexible neural model that also preserves high accuracy.

Crossref

OpenEdition

Open Access Repository

DeAL: Decoding-time Alignment for Large Language Models

Author: Bonadiman Daniele
Gupta Arshit
Huang James Y.
Kirchhoff Katrin
Lai Yi-an
Mansour Saab
Pappas Nikolaos
Roth Dan
Sengupta Sailik
Publication date: 20/02/2024

Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and the reliance on a model developer's view of universal and static principles are key limitations. Second, the residual gaps in model training and the reliability of such approaches are also questionable (e.g., susceptibility to jail-breaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work. Comment: The appendix contains data that is offensive / disturbing in nature.

arXiv.org e-Print Archive
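
Illustrative note on the DeAL abstract above: "decoding as a heuristic-guided search process" can be sketched as a beam search whose ranking adds a user-supplied reward to the model's log-probability. This is a toy under stated assumptions, not DeAL's implementation: the language model, the keyword reward, the weight, and all names are hypothetical.

```python
import math

def toy_next_token_probs(prefix):
    """Hypothetical stand-in for an LLM: fixed probabilities, prefix ignored."""
    return {"bad": 0.5, "good": 0.45, "<eos>": 0.05}

def keyword_reward(tokens, banned=("bad",)):
    """Programmatic constraint: minus one for every banned keyword used."""
    return -float(sum(tok in banned for tok in tokens))

def guided_beam_search(next_probs, reward, steps=3, beam_size=2, weight=2.0):
    """Beam search ranked by log-probability plus weight * reward."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == "<eos>":
                candidates.append((tokens, logp))  # finished, keep as is
                continue
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # The reward acts as the search heuristic when ranking partial outputs.
        candidates.sort(key=lambda c: c[1] + weight * reward(c[0]), reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]

unguided = guided_beam_search(toy_next_token_probs, keyword_reward, weight=0.0)
guided = guided_beam_search(toy_next_token_probs, keyword_reward, weight=2.0)
```

With the weight set to zero the search follows the model's most likely tokens; with a positive weight the keyword constraint steers it away from the banned word. The reward is called on every candidate at every step, which gives a sense of the slower decoding the abstract mentions.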